2,793 research outputs found

    Optimal Lower Bound for Itemset Frequency Indicator Sketches

    Full text link
    Given a database, a common problem is to find the pairs or kk-tuples of items that frequently co-occur. One specific problem is to create a small space "sketch" of the data that records which kk-tuples appear in more than an ϵ\epsilon fraction of rows of the database. We improve the lower bound of Liberty, Mitzenmacher, and Thaler [LMT14], showing that Ω(1ϵdlog(ϵd))\Omega(\frac{1}{\epsilon}d \log (\epsilon d)) bits are necessary even in the case of k=2k=2. This matches the sampling upper bound for all ϵ1/d.99\epsilon \geq 1/d^{.99}, and (in the case of k=2k=2) another trivial upper bound for ϵ=1/d\epsilon = 1/d.Comment: 3 page

    The Noisy Power Method: A Meta Algorithm with Applications

    Full text link
    We provide a new robust convergence analysis of the well-known power method for computing the dominant singular vectors of a matrix that we call the noisy power method. Our result characterizes the convergence behavior of the algorithm when a significant amount noise is introduced after each matrix-vector multiplication. The noisy power method can be seen as a meta-algorithm that has recently found a number of important applications in a broad range of machine learning problems including alternating minimization for matrix completion, streaming principal component analysis (PCA), and privacy-preserving spectral analysis. Our general analysis subsumes several existing ad-hoc convergence bounds and resolves a number of open problems in multiple applications including streaming PCA and privacy-preserving singular vector computation.Comment: NIPS 201

    Respiratory Medication Adherence : Toward a Common Language and a Shared Vision

    Get PDF
    Part of this work, conducted by E. Van Ganse, has been performed in the context of the ASTRO-LAB project, which received funding from the European Community's 7th Framework (FP7/2007-2013) under grant agreement no. 282593. Teva supported the meeting costs at which the concepts in this paper were discussed by the co-authors and the open access publication fee for this article. The authors had full editorial control over the ideas presented.Peer reviewedPublisher PD

    The Sketching Complexity of Graph and Hypergraph Counting

    Full text link
    Subgraph counting is a fundamental primitive in graph processing, with applications in social network analysis (e.g., estimating the clustering coefficient of a graph), database processing and other areas. The space complexity of subgraph counting has been studied extensively in the literature, but many natural settings are still not well understood. In this paper we revisit the subgraph (and hypergraph) counting problem in the sketching model, where the algorithm's state as it processes a stream of updates to the graph is a linear function of the stream. This model has recently received a lot of attention in the literature, and has become a standard model for solving dynamic graph streaming problems. In this paper we give a tight bound on the sketching complexity of counting the number of occurrences of a small subgraph HH in a bounded degree graph GG presented as a stream of edge updates. Specifically, we show that the space complexity of the problem is governed by the fractional vertex cover number of the graph HH. Our subgraph counting algorithm implements a natural vertex sampling approach, with sampling probabilities governed by the vertex cover of HH. Our main technical contribution lies in a new set of Fourier analytic tools that we develop to analyze multiplayer communication protocols in the simultaneous communication model, allowing us to prove a tight lower bound. We believe that our techniques are likely to find applications in other settings. Besides giving tight bounds for all graphs HH, both our algorithm and lower bounds extend to the hypergraph setting, albeit with some loss in space complexity

    On the Power of Adaptivity in Sparse Recovery

    Get PDF
    The goal of (stable) sparse recovery is to recover a kk-sparse approximation xx* of a vector xx from linear measurements of xx. Specifically, the goal is to recover xx* such that ||x-x*||_p <= C min_{k-sparse x'} ||x-x'||_q for some constant CC and norm parameters pp and qq. It is known that, for p=q=1p=q=1 or p=q=2p=q=2, this task can be accomplished using m=O(klog(n/k))m=O(k \log (n/k)) non-adaptive measurements [CRT06] and that this bound is tight [DIPW10,FPRU10,PW11]. In this paper we show that if one is allowed to perform measurements that are adaptive, then the number of measurements can be considerably reduced. Specifically, for C=1+epsC=1+eps and p=q=2p=q=2 we show - A scheme with m=O((1/eps)kloglog(neps/k))m=O((1/eps)k log log (n eps/k)) measurements that uses O(logkloglog(neps/k))O(log* k \log \log (n eps/k)) rounds. This is a significant improvement over the best possible non-adaptive bound. - A scheme with m=O((1/eps)klog(k/eps)+klog(n/k))m=O((1/eps) k log (k/eps) + k \log (n/k)) measurements that uses /two/ rounds. This improves over the best possible non-adaptive bound. To the best of our knowledge, these are the first results of this type. As an independent application, we show how to solve the problem of finding a duplicate in a data stream of nn items drawn from 1,2,...,n1{1, 2, ..., n-1} using O(logn)O(log n) bits of space and O(loglogn)O(log log n) passes, improving over the best possible space complexity achievable using a single pass.Comment: 18 pages; appearing at FOCS 201
    corecore